nsroot: Minimalist Process Isolation Tool Implemented With Linux Namespaces
Data analyses in the life sciences are moving from tools run on a personal
computer to services run on large computing platforms. This creates a need to
package tools and dependencies for easy installation, configuration and
deployment on distributed platforms. In addition, for secure execution there is
a need for process isolation on a shared platform. Existing virtual machine and
container technologies are often more complex than traditional Unix utilities,
like chroot, and often require root privileges in order to set up or use. This
is especially challenging on HPC systems where users typically do not have root
access. We therefore present nsroot, a lightweight Linux namespaces based
process isolation tool. It allows restricting the runtime environment of data
analysis tools that may not have been designed with security as a top priority,
in order to reduce the risk and consequences of security breaches, without
requiring any special privileges. The codebase of nsroot is small, and it
provides a command line interface similar to chroot. It can be used on all
Linux kernels that implement user namespaces. In addition, we propose combining
nsroot with the AppImage format for secure execution of packaged applications.
nsroot is open sourced and available at: https://github.com/uit-no/nsroot.
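As a rough illustration of the mechanism nsroot builds on (this is a sketch, not nsroot's own code), the following Python snippet forks a child that calls unshare(2) with CLONE_NEWUSER through ctypes. On kernels that allow unprivileged user namespaces this succeeds without any special privileges, which is exactly the property nsroot exploits; on kernels where they are disabled it fails cleanly.

```python
import ctypes
import os

CLONE_NEWUSER = 0x10000000  # flag value from <sched.h>
libc = ctypes.CDLL(None, use_errno=True)

def try_user_namespace():
    """Fork a child that calls unshare(CLONE_NEWUSER), the core of the approach."""
    pid = os.fork()
    if pid == 0:  # child: try to enter a fresh user namespace
        ok = libc.unshare(CLONE_NEWUSER) == 0
        # Inside a new user namespace the UID stays unmapped until
        # /proc/self/uid_map is written; a tool like nsroot sets up that
        # mapping before chroot-ing into the restricted root.
        os._exit(0 if ok else 1)
    _, status = os.waitpid(pid, 0)
    return "isolated" if os.waitstatus_to_exitcode(status) == 0 else "unsupported"

print(try_user_namespace())
```

No root is required at any point; the same no-privileges property is what makes this approach usable on HPC systems.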
Kvik: Interactive exploration of genomic data from the NOWAC postgenome biobank
We have developed Kvik, a system for interactive exploration of genomic data from the Norwegian Women and Cancer (NOWAC) postgenome biobank. The goal of the NOWAC study is to understand the dynamics of carcinogenesis through multi-level functional analyses of transcriptomics and epigenetics using blood and tissue samples. Kvik provides a tool for exploring gene expression data, incorporating both statistical analysis and interactive visualizations in a single system. The tool is open-sourced at github.com/fjukstad/kvik
Teaching Electronics and Programming in Norwegian Schools Using the air:bit Sensor Kit
We describe lessons learned from using the air:bit project to introduce more
than 150 students in the Norwegian upper secondary school to computer
programming, engineering and environmental sciences. In the air:bit project,
students build and program portable air quality sensor kits, and use their
air:bit to collect data to investigate patterns in air quality in their local
environment. When the project ended, students had collected more than 400,000
measurements with their air:bit kits, and could describe local patterns in air
quality. Students participate in all parts of the project, from soldering
components and programming the sensors, to analyzing the air quality
measurements. We conducted a survey after the project and describe our lessons
learned from the project. The results show that the project successfully taught
the students fundamental concepts in computer programming, electronics, and the
scientific method. In addition, all the participating teachers reported that
their students showed good learning outcomes.
Transcription factor PAX6 as a novel prognostic factor and putative tumour suppressor in non-small cell lung cancer
Source at https://doi.org/10.1038/s41598-018-23417-z. Licensed CC BY-NC-ND 4.0. Lung cancer is the leading cause of cancer deaths. Novel predictive biomarkers are needed to improve treatment selection and provide more accurate prognostication. PAX6 is a transcription factor with a proposed tumour suppressor function. Immunohistochemical staining for PAX6 was performed on tissue microarrays from 335 non-small cell lung cancer (NSCLC) patients. Multivariate analyses of clinico-pathological variables and disease-specific survival (DSS) were carried out, and phenotypic changes of two NSCLC cell lines with knockdown of PAX6 were characterized. While PAX6 expression was only associated with a trend of better DSS (p = 0.10), the pN+ subgroup (N = 103) showed a significant correlation between high PAX6 expression and longer DSS (p = 0.022). Median survival for pN+ patients with high PAX6 expression was 127.4 months, versus 22.9 months for patients with low PAX6 expression. In NCI-H661 cells, knockdown of PAX6 strongly activated serum-stimulated migration. In NCI-H460 cells, PAX6 knockdown activated anchorage-independent growth. We did not observe any significant effect of PAX6 on proliferation in either cell line. Our findings strongly support the proposition of PAX6 as a valid and positive prognostic marker in node-positive NSCLC. Further studies are needed to provide a mechanistic explanation for the role of PAX6 in NSCLC.
Kvik : interactive exploration of genomic data from the NOWAC postgenome biobank
Recent technological advances generate large amounts of data for epidemiological analyses that can yield novel insights into the dynamics of carcinogenesis. These analyses are often performed without a prior hypothesis and therefore require an exploratory approach. Realizing exploratory analysis requires the development of new systems that provide interactive exploration and visualization of large-scale scientific datasets.
This thesis presents Kvik, an interactive system for exploring the dynamics of carcinogenesis through integrated studies of biological pathways and genomic data. Kvik is designed as a three-tiered application, an architecture that is commonly used for peta-scale applications. It provides researchers with a lightweight web application for navigating through biological pathways from the KEGG database integrated with genomic data from the NOWAC postgenome biobank.
In collaboration with researchers from the NOWAC systems epidemiology
group, we have described the requirements for such a system, and by using an
iterative approach we implemented Kvik through small development cycles,
involving the end-users in the development process. Throughout the project we
have gained valuable interdisciplinary experience in developing systems for use
in explorative analysis of carcinogenesis.
Through an evaluation of the exploration tasks and workflow of an end-user, we
demonstrate that Kvik enables interactive exploration of genomic data and
biological pathways.
We believe Kvik is important to enable novel discoveries from
the data produced in the NOWAC systems epidemiology project. It provides
epidemiology researchers with access to powerful compute and storage resources
enabling the use of advanced statistical methods for the analysis. Finally, from
our experiences in developing Kvik, we provide use cases and requirements for
future analysis, computation and storage systems developed in our
research group and by others.
Toward Reproducible Analysis and Exploration of High-Throughput Biological Datasets
This dissertation argues that we can develop unified systems for reproducible exploration and analysis of high-throughput biological datasets. We propose an approach, Small Modular Entities (SME), that orchestrates the execution of analysis pipelines and data exploration applications. We realize SMEs using software container technologies together with well-defined interfaces, configuration, and orchestration. This simplifies the development of such applications and provides the detailed information needed to reproduce the analyses.
A Review of Scalable Bioinformatics Pipelines
Scalability is increasingly important for bioinformatics analysis services, since these must handle larger datasets, more jobs, and more users. The pipelines used to implement analyses must therefore scale with respect to the resources on a single compute node, the number of nodes on a cluster, and also to cost-performance. Here, we survey several scalable bioinformatics pipelines and compare their design and their use of underlying frameworks and infrastructures. We also discuss current trends in bioinformatics pipeline development.
Reproducible Data Analysis Pipelines for Precision Medicine
Precision medicine brings the promise of more precise diagnosis and individualized therapeutic strategies from analyzing a cancer’s genomic signature. Technologies such as high-throughput sequencing enable cheaper data collection at higher speed, but rely on modern data analysis platforms to extract knowledge from these high-dimensional datasets. Since this is a rapidly advancing field, new diagnoses and therapies often require tailoring of the analysis. These pipelines are therefore developed iteratively, continuously modifying analysis parameters before arriving at the final results. To enable reproducible results it is important to record all these modifications and decisions made during the analysis process.
We built a system, walrus, to support reproducible analyses for iteratively developed analysis pipelines. The approach is based on our experiences developing and using deep analysis pipelines to provide insights and recommendations for treatment in an actual breast cancer case. We designed walrus for the single servers or small compute clusters typically available for novel treatments in the clinical setting. walrus leverages software containers to provide reproducible execution environments, and integrates with modern version control systems to capture provenance of data and pipeline parameters.
We have used walrus to analyze a patient’s primary tumor and adjacent normal tissue, including subsequent metastatic lesions. Although we have used walrus for specialized analyses of whole-exome sequencing datasets, it is a general data analysis tool that can be applied in a variety of scientific disciplines.
We have open sourced walrus along with example data analysis pipelines at github.com/uit-bdps/walrus.
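The container-plus-version-control approach described above can be sketched as a pipeline description kept in plain text. The stage names, keys, and container images below are illustrative only, not walrus's actual configuration schema:

```yaml
# Hypothetical pipeline description: each stage runs inside a pinned
# container image, so the execution environment is reproducible.
pipeline:
  name: exome-variant-calling
  stages:
    - name: align
      image: biocontainers/bwa:v0.7.17    # pinned image tag = fixed environment
      cmd: "bwa mem ref.fa reads_R1.fq reads_R2.fq > aligned.sam"
    - name: sort
      image: biocontainers/samtools:v1.9
      cmd: "samtools sort -o aligned.bam aligned.sam"
      depends: [align]
```

Because the description and its parameters live in a plain-text file, committing each change to version control captures exactly the provenance record the abstract describes: which parameters produced which results, across every iteration of the pipeline.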
Kvik: three-tier data exploration tools for flexible analysis of genomic data in epidemiological studies
Published version. Source at https://doi.org/10.12688/f1000research.6238.1. Kvik is an open-source system that we developed for explorative analysis of functional genomics data from large epidemiological studies. Creating such studies requires a significant amount of time and resources. It is therefore common to reuse the data from one study for several research projects. Often each project requires implementing new analysis code, integration with specific knowledge bases, and specific visualizations. Existing data exploration tools do not provide all the required functionality for such multi-study data exploration. We have therefore developed the Kvik framework, which makes it easy to implement specialized data exploration tools for specific projects. Applications in Kvik follow the three-tier architecture commonly used in web applications, with REST interfaces between the tiers. This makes it easy to adapt the applications to new statistical analyses, metadata, and visualizations. Kvik uses R to perform on-demand data analyses when researchers explore the data. In this note, we describe how we used Kvik to develop the Kvik Pathways application to explore gene expression data from healthy women with high and low plasma ratios of essential fatty acids using biological pathway visualizations. Researchers interact with Kvik Pathways through a web application that uses the JavaScript libraries Cytoscape.js and D3. We use Docker containers to make deployment of Kvik Pathways simple.
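The three-tier, REST-between-tiers pattern described above can be sketched in a few lines of Python. The endpoint path, toy expression values, and statistic here are purely illustrative (Kvik's middle tier delegates the statistics to R); the point is the shape: a thin client requests an analysis over REST, and the middle tier computes it on demand against the data tier.

```python
# Sketch of a three-tier layout: in-memory data tier, a REST middle tier
# that computes statistics on demand, and a client acting as the
# presentation tier. All names and values are illustrative.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from statistics import mean
from urllib.request import urlopen

# Data tier: gene -> expression measurements (toy data).
EXPRESSION = {"BRCA1": [2.1, 2.5, 1.9], "TP53": [0.8, 1.1, 0.9]}

class AnalysisHandler(BaseHTTPRequestHandler):
    # Middle tier: GET /mean/<gene> computes the statistic on demand.
    def do_GET(self):
        gene = self.path.rsplit("/", 1)[-1]
        if not self.path.startswith("/mean/") or gene not in EXPRESSION:
            self.send_error(404)
            return
        body = json.dumps({"gene": gene, "mean": mean(EXPRESSION[gene])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), AnalysisHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Presentation tier: fetch one analysis result over REST.
url = "http://127.0.0.1:%d/mean/BRCA1" % server.server_address[1]
result = json.loads(urlopen(url).read())
print(result["mean"])  # arithmetic mean of the three toy samples
server.shutdown()
```

Keeping the tiers behind a plain REST interface is what lets each one be swapped independently, e.g. replacing this toy statistic with an R-backed analysis, as Kvik does.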